Search for: All records

Creators/Authors contains: "Meka, R"

Note: When clicking on a Digital Object Identifier (DOI) number, you will be taken to an external site maintained by the publisher. Some full text articles may not yet be available without a charge during the embargo (administrative interval).

  1. In traditional models of supervised learning, the goal of a learner -- given examples from an arbitrary joint distribution on ℝ^d × {±1} -- is to output a hypothesis that is competitive (to within ϵ) with the best-fitting concept from some class. In order to escape strong hardness results for learning even simple concept classes, we introduce a smoothed-analysis framework that requires a learner to compete only with the best classifier that is robust to small random Gaussian perturbation. This subtle change allows us to give a wide array of learning results for any concept that (1) depends on a low-dimensional subspace (aka multi-index model) and (2) has a bounded Gaussian surface area. This class includes functions of halfspaces and (low-dimensional) convex sets, cases that are only known to be learnable in non-smoothed settings with respect to highly structured distributions such as Gaussians. Surprisingly, our analysis also yields new results for traditional non-smoothed frameworks such as learning with margin. In particular, we obtain the first algorithm for agnostically learning intersections of k-halfspaces in time k^{poly(log k / (ϵγ))}, where γ is the margin parameter. Before our work, the best-known runtime was exponential in k (Arriaga and Vempala, 1999). 
    Free, publicly-accessible full text available April 30, 2026
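    As an illustration of the smoothed benchmark described in this abstract, the toy sketch below estimates a fixed concept's error under small Gaussian perturbations of the inputs via Monte Carlo. It is a minimal sketch, not code from the paper; the names and parameters (smoothed_error, sigma, n_perturbations) are illustrative assumptions.

    # Hypothetical sketch (not from the paper): Monte Carlo estimate of the
    # error of a fixed concept when each example x is perturbed by N(0, sigma^2 I),
    # i.e. the kind of "smoothed" benchmark the learner must compete with.
    import numpy as np

    def smoothed_error(concept, X, y, sigma=0.1, n_perturbations=100, seed=0):
        """Average 0/1 loss of `concept` on Gaussian perturbations of X."""
        rng = np.random.default_rng(seed)
        errs = []
        for _ in range(n_perturbations):
            X_pert = X + sigma * rng.standard_normal(X.shape)
            errs.append(np.mean(concept(X_pert) != y))
        return float(np.mean(errs))

    if __name__ == "__main__":
        # Example: a single halfspace sign(<w, x>) as the concept, with ±1 labels.
        rng = np.random.default_rng(1)
        d = 5
        w = rng.standard_normal(d)
        X = rng.standard_normal((200, d))
        y = np.sign(X @ w)
        concept = lambda Z: np.sign(Z @ w)
        print(smoothed_error(concept, X, y, sigma=0.1))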
  3. A core component present in many successful neural network architectures is an MLP block of two fully connected layers with a non-linear activation in between. An intriguing phenomenon observed empirically, including in transformer architectures, is that, after training, the activations in the hidden layer of this MLP block tend to be extremely sparse on any given input. Unlike traditional forms of sparsity, where there are neurons/weights which can be deleted from the network, this form of dynamic activation sparsity appears to be harder to exploit to get more efficient networks. Motivated by this, we initiate a formal study of PAC learnability of MLP layers that exhibit activation sparsity. We present a variety of results showing that such classes of functions do lead to provable computational and statistical advantages over their non-sparse counterparts. Our hope is that a better theoretical understanding of sparsely activated networks will lead to methods that can exploit activation sparsity in practice. 
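    The sketch below shows the standard MLP block described in this abstract and measures its dynamic activation sparsity (the fraction of hidden units that are exactly zero on a given input). It is a minimal illustration, not the paper's construction; the function names and dimensions are assumptions.

    # Hypothetical sketch (not from the paper): a two-layer MLP block with a
    # ReLU in between, plus a measurement of per-input activation sparsity.
    import numpy as np

    def mlp_block(x, W1, b1, W2, b2):
        """Standard MLP block: fully connected -> ReLU -> fully connected."""
        hidden = np.maximum(x @ W1 + b1, 0.0)   # ReLU activations
        return hidden @ W2 + b2, hidden

    def activation_sparsity(hidden):
        """Fraction of hidden activations that are exactly zero."""
        return float(np.mean(hidden == 0.0))

    if __name__ == "__main__":
        rng = np.random.default_rng(0)
        d_in, d_hidden, d_out = 64, 256, 64
        W1 = rng.standard_normal((d_in, d_hidden)) / np.sqrt(d_in)
        b1 = np.zeros(d_hidden)
        W2 = rng.standard_normal((d_hidden, d_out)) / np.sqrt(d_hidden)
        b2 = np.zeros(d_out)
        x = rng.standard_normal(d_in)
        _, hidden = mlp_block(x, W1, b1, W2, b2)
        # Roughly 0.5 at random initialization; the abstract's point is that
        # trained networks empirically show far higher sparsity than this.
        print("sparsity:", activation_sparsity(hidden))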
  3. We study the differentially private (DP) empirical risk minimization (ERM) problem under the semi-sensitive DP setting where only some features are sensitive. This generalizes the Label DP setting where only the label is sensitive. We give improved upper and lower bounds on the excess risk for DP-ERM. In particular, we show that the error only scales polylogarithmically in terms of the sensitive domain size, improving upon previous results that scale polynomially in the sensitive domain size (Ghazi et al., 2021). 
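    To make the semi-sensitive setting concrete, the sketch below gives one plausible formalization of a record split into public and sensitive parts and of the neighboring-dataset relation for this setting. It is an illustrative assumption about the setup, not the paper's formal definition; all names (Record, are_neighbors) are hypothetical.

    # Hypothetical sketch (not from the paper): records with non-sensitive
    # features, sensitive features, and a label; neighboring datasets may
    # differ in one record's sensitive part (and label), generalizing Label DP.
    from dataclasses import dataclass
    from typing import Tuple

    @dataclass(frozen=True)
    class Record:
        public_features: Tuple[float, ...]   # not protected by DP
        sensitive_features: Tuple[int, ...]  # protected; domain size matters for the bounds
        label: int                           # protected, as in Label DP

    def are_neighbors(d1, d2):
        """Same length, identical public features, and at most one record
        whose sensitive features or label differ."""
        if len(d1) != len(d2):
            return False
        diffs = 0
        for r1, r2 in zip(d1, d2):
            if r1.public_features != r2.public_features:
                return False
            if (r1.sensitive_features, r1.label) != (r2.sensitive_features, r2.label):
                diffs += 1
        return diffs <= 1

    if __name__ == "__main__":
        r = Record(public_features=(1.0, 2.0), sensitive_features=(3,), label=1)
        r2 = Record(public_features=(1.0, 2.0), sensitive_features=(7,), label=0)
        print(are_neighbors([r], [r2]))  # True: only the sensitive part differs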
  4. One powerful theme in complexity theory and pseudorandomness in the past few decades has been the use of lower bounds to give pseudorandom generators (PRGs). However, the general results using this hardness vs. randomness paradigm suffer from a quantitative loss in parameters, and hence do not give nontrivial implications for models where we don't know super-polynomial lower bounds but do know lower bounds of a fixed polynomial. We show that when such lower bounds are proved using random restrictions, we can construct PRGs that are essentially best possible without in turn improving the lower bounds. More specifically, say that a circuit family has shrinkage exponent Γ if a random restriction leaving a p fraction of variables unset shrinks the size of any circuit in the family by a factor of p^{Γ + o(1)}. Our PRG uses a seed of length s^{1/(Γ + 1) + o(1)} to fool circuits in the family of size s. By using this generic construction, we get PRGs with polynomially small error for the following classes of circuits of size s and with the following seed lengths: (1) for de Morgan formulas, seed length s^{1/3+o(1)}; (2) for formulas over an arbitrary basis, seed length s^{1/2+o(1)}; (3) for read-once de Morgan formulas, seed length s^{.234...}; (4) for branching programs of size s, seed length s^{1/2+o(1)}. The previous best PRGs known for these classes used seeds of length bigger than n/2 to output n bits, and worked only when the size s=O(n). 
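    The basic operation behind shrinkage is a p-random restriction: each variable is left unset with probability p and otherwise fixed to a uniformly random bit. The sketch below shows that operation and how it restricts a boolean function; it is a minimal illustration, not the PRG construction itself, and the helper names are assumptions.

    # Hypothetical sketch (not from the paper): sample a p-random restriction
    # and restrict a boolean function to the surviving (unset) variables.
    import random

    def random_restriction(n, p, seed=None):
        """Length-n list over {0, 1, '*'}; '*' marks a variable left unset."""
        rng = random.Random(seed)
        return ['*' if rng.random() < p else rng.randint(0, 1) for _ in range(n)]

    def restrict(f, rho):
        """Return f restricted by rho: callers supply one bit per '*' position."""
        free = [i for i, v in enumerate(rho) if v == '*']
        def g(free_bits):
            x = list(rho)
            for i, b in zip(free, free_bits):
                x[i] = b
            return f(x)
        return g

    if __name__ == "__main__":
        parity = lambda x: sum(x) % 2
        rho = random_restriction(8, p=0.25, seed=0)
        g = restrict(parity, rho)
        print(rho, g([1] * rho.count('*')))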